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1.  Introduction 

The  classical  version  of  Newton's  method  for  minimizing  a twice 
continuously  function  f:  En-HL  takes  the  form: 


\+l  = \ ~ 72f(xk)"lvf(*k) 


(1.1) 


where  (Vzf(x^)  denotes  the  a by  n Hessian  matrix  of  second  derivatives, 

and  Vf(x^)  denotes  the  n by  1 gradient  vector  of  first  derivatives 

2 

evaluated  at  x^  . Near  a point  x*  where  Vf(x*)  = 0 and  V f(x*)  is 

positive  definite  this  algorithm  converges  with  a rate  that  is  at  least 
quadratic  (when  V^f  satisfies  a Lipschitz  condition).  When  x^  is  started 

far  away  from  such  a point,  the  algorithm  may  not  converge  and  if  it  does, 
it  may  not  be  to  a local  unconstrained  minimizer.  Modifications  must  be  made 
to  handle  the  cases  when  V f(x^)  Is  indefinite  or  singular.  It  is  usually 

considered  desirable  to  modify  the  algorithm  so  that  it  is  a descent  method 
(i.e.,  f(x^+^)  < f(x^)  for  aH  k)»  and  so  that  accumulation  points  of  the 

sequence  {x^}  are  stationary  points  with  Hessian  matrices  which  are  positive 
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There  are  two  separate  issues.  One  is  to  develop  strategies  for 
situations  where  (1.1)  does  not  apply,  the  other  is  to  provide  a numerically 
stable  matrix  method  to  compute  the  quantities  required  by  the  various  stra- 
tegies. In  this  paper  three  strategies  are  presented,  and  a matrix  method  is 
developed  for  implementing  them.  The  matrix  method  differs  from  the  tradi- 
tional ones  In  that  it  operates  on  the  Hessian  matrix  assuming  that  it  is 
given  as  the  sum  of  matrices  of  rank  one  (outer  product  matrices)  rather 
than  in  the  square  symmetric  form  that  the  usual  techniques  are  applied  to. 
This  is  shown  to  be  a natural  byproduct  of  the  factorable  programming  point 
of  view  which  is  defined  in  Section  2.  The  matrix  technique  is  combined 
with  a factorable  programming  language  (McCormick  1974)  and  the  SUMT  algorithm 
(Fiacco  and  McCormick  1968)  for  solving  general  nonlinear  programs.  Then 
the  three  strategies  are  evaluated  on  several  test  problems. 

2.  Definitions 

Most  functions  of  several  variables  used  in  nonlinear  optimization 
are  complicated  compositions  of  transformed  sums  and  products  of  functions 
of  a single  variable.  Once  this  observation  is  made  formal,  an  outline  of 
an  automatic  computer-oriented  way  of  representing  nonlinear  programming 
problems  becomes  clear. 

Definition"  A function  f(x,,...,x  ) is  a factorable  function  of 

1 n 

several  variables  if  it  can  be  represented  as  the  last  in  a finite  sequence 
of  functions  (fj(x)}  which  are  composed  as  follows: 

f j (x)  = Xj  , for  j=l,...,n  . — « 

For  j > n , f j (x)  is  either  of  the  form 

ffc(x)  + f ^ (x)  , k, it  < j , 

or  of  the  form 

fk(x>  * » k,2.  < j > 

or  of  the  form 

T[fk(x)]  , k < j , 

where  T(f)  is  a function  of  a single  variable. 
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Ic  should  be  emphasized  that  this  is  a natural  way  of  looking  at 
functions  of  several  variables  since  it  corresponds  to  the  way  they  are 
used  when  evaluated  at  a particular  set  of  values  of  the  variable. 

When  a function  is  represented  in  factorable  form,  the  computation 
of  its  first  and  second  derivatives  is  very  simple.  Consider  the  general 
form  for  the  first  and  second  derivatives  of  a function  which  is  either 
the  sum  of  two  functions,  the  product  of  two  functions,  or  a transformation 
(function  of  one  variable)  of  a function.  (The  derivative  is  f'(x)  , a 

O 

one  by  n vector,  and  f"(x)  is  identical  with  V f(x))  . 

For  the  derivative, 

[f(x)+g(x)]'  = f'(x)  + g'(x)  , 

[f(x)*g(x)]’  = f(x)  • g'(x)  + g(x)  • f 1 (x)  , (2.1) 

and 

T'[f(x)]  = T[f(x)]f'(x)  , (2.2) 

where  T(f)  = 3T(f)/3f  . 

For  the  second  derivative, 

[f(x)+g(x)]"  = f"(x)  + g"(x)  , 

[f(x)*g(x)]"  = f,f(x)  • g(x)  + g”(x)  • f(x)  , (2.3) 

+ f(x)T  • g’(x)T  + g (x)T  • f * (x)  , 

and 

[T"[f(x)]]  = f”(x)  • T[f (x) ] + f'(x)TT[f(x)]f'(x)  , (2.4) 

where  T(f)  = 32T(f)/3f2  . 

Several  points  should  be  noted. 

(i)  The  computation  of  (2.1)  involves  f(x),g(x)  which  would 
already  have  been  computed  in  order  to  evaluate  their 
product. 

(il)  The  computation  of  (2.3)  involves  f'(x),g'(x)  , which 
would  already  have  been  computed  in  order  to  evaluate 
(2.1). 

(iii)  Equation  (2.4)  uses  T[f(x)]  and  f'(x)  . which  were 
previously  needed  in  (2.2). 
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(iv)  By  induction,  it  follows  that  Hessian  matrices  of 

factorable  functions  are  naturally  given  as  sums  of 
outer  products  ( dyads ) of  vectors,  i.e.. 


^u±(x)ai(x)v^(x)  + vi(x)ai(x)u^(x)  , 

where  {u^(x) } .{v^x) } are  n x 1 vectors,  {^(x)}  are 
scalars,  and  the  (u^x) } ,{v^(x) } are  available,  having 
been  required  for  the  derivative  computation. 


Because  of  the  identity 


abT  + baT  = (a+b) (1/2) (a+b)T  + (a-b)  (-1/2) (a-b)T 


it  can  be  assumed  without  loss  of  generality  that  the  Hessian  matrix  of  a 
factorable  function  is  given  in  the  font 


F = V2f(x)  *■  ^ai(x)ci(x)ai(x)T  . 


For  optimization  problems  which  are  factorable,  the  factorable 
nonlinear  programming  problem  is  written  in  more  compact  form: 


minimize  f„(x) 
N 

_n 

x e E 


subject  to 


Li  1 f±(x)  1 U± 


for  i =>  1,...,N-1  (possibly  = - 00  and/or  = °°)  , where 

f^(x)  * x^  , for  i =l,...,n  , and  the  remainder  are  defined  recursively 


as 


follows:  given  {fp(x)>  for  p * l,...,i-l  , then  for  i = n+l,...,N  , 


where  the  T’s,  U’s,  and  V's  are  functions  of  a single  variable. 


A computer  program  (McCormick  1974)  has  been  written  which  accepts 
the  factorable  problem  as  coded  on  cards  and  provides  the  interface  with 
nonlinear  programming  algorithms.  A symbolic  version  of  this  was  used  in 
this  study. 
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A direction  s^  is  called  a descent  direction  at  the  point  x^  if 

sk  Vf(xk}  0 * 
and  a nonascent  direction  if 

sT  Vf^)  <_  0 . 

A direction  d^  is  called  a direction  of  negative  curvature  if 

4 ,2£‘V  ■‘ki0  • 


and  a direction  of  nonpositive  curvature  if 

dk  i 0 • 


For  a matrix  A , the  generalized  inverse , denoted  by  A is  the 
unique  matrix  satisfying  the  four  relations: 

AA  A = A,  A AA  = A+  , (AA+  = AA+  , and  (A+A)^  = a"*"a  . 

Consider  the  eigenvector  eigenvalue  reduction  of  a symmetric  matrix  A 


A = E A E 
T 

where  EE  * I , and  A is  a diagonal  matrix  of  eigenvalues.  The  positive 
part  of  A denoted  by  p is 


p * hi>o  eixie5 


where  is  the  ith  column  of  E . 

The  positive  part  can  be  shown  to  be  invariant  to  the  particular 
eigenvector-eigenvalue  reduction  used. 

A symmetric  matrix  A is  called  positive  semi-definite  singular  (PSDS) 
if  it  is  positive  semi-definite  and  has  at  least  one  zero  eigenvalue. 
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The  orthogonal  matrix  given  by 


/cor'  -sin0' 


G = 


^sin0  cos0y 

has  the  effect  of  rotating  the  axes  through  an  angle  -0  and  is  called  a 
Givens  matrix.  Consider  point  (X^,Y^)  in  the  X-Y  plane  coordinate  of 
Figure  1. 


By  proper  choice  of  the  angle  0,  point  (X^,Y^)  can  be  transformed  to 
(X^,0).  A Givens  matrix  can  be  used  to  achieve  this  type  of  linear 
transformation.  Consider  the  linear  transformation 


cos0  -sinb 


sin0  cos0 


The  desired  transformation  can  be  achieved  by  letting 

X, 


0 ■ cos 


-1 
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A Givens  matrix  is  usually  used  to  transform  the  n-vector  W 


W = I W 


where  is  a t x 1 vector,  is  a scalar  and  is  (n-t-1)  x 1 vector 


to  the  general  form  of 


• where  Wj  is  a scalar. 


This  transformation  can  be  achieved  through  a sequence  of  orthogonal 
transformations  which  places  a zero  successively  in  the  last  n-t-1  ele- 
ments of  W.  In  order  to  find  the  reduced  form  of  W we  must  embed  the 
2x2  Givens  matrix  in  the  n x n identity  matrix. 

Let  G*’^  denote  the  matrix  which  rotates  the  i,jth  axes  through 
an  angle  -0,  i.e.,  reduces  w^  (jth  element  of  W)  to  zero  by  a linear 
combination  of  this  element  with  w^  (ith  element  of  W) . Let  C = cos9 
and  S = sin0.  Then  G*’^  is  given  by 


1 

C 0 


where  a blank  space  represents  zero,  C = 


and  S 


FFF  F + “j 


Here  we  show  a sequence  cf  Givens  matrices  which  if  multiplied  by  W 
achieve  the  derived  reduction: 

W, 


n i-1  i 

= n G ’ W 


i~t+2 


Consider  the  vector  W = J 6 ] . Through  use  of  Givens  matrices 

3 


it  is  desired  to  transform  W to  W - | Wj  J , where  as  before  is  a 

0 


scalar  to  be  determined.  Then 


H* 

o 

o 

o 

o 

/ X 

2 

2 ' 

0 10  0 0 

8 

8 

0 0 10  0 

6 

- 

6 

ooo  — 

3 

34 

/34  /34 

/34 

-5  3 

0 0 0 — ~ 

5 

0 

k /i4  ^34. 
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Next, 

W = G3,4  Wx 


^ / 

\ 

1 0 0 0 0 

2 

2 

0 10  0 0 

8 

8 

0 0 — 17  0 

6 

70 

/70  t/595 

/ 70 

0 o _6_  o 

34 

0 

/595  /70 

/34 

0 0 0 0 1 

0 

0 

k / 

v / 

/ 

In  Givens  transformation  we  noted  that  it  required  n-t-1  itera- 
tions in  order  to  place  a zero  in  the  last  n-t-1  elements  of  W.  House- 
holder proposed  another  transformation  whereby  the  entire  n-t-1  elements 
are  changed  to  zero  at  once.  This  procedure  is  much  faster  than  the 
Givens  transformation,  and  it  is  preferred  whenever  possible.  It  is 


desired  to  find  a Householder  matrix  H which  can  reduce  W to  W = 


The  Householder  matrix  is: 


/o  \ (e)"1  (oT,  n,  Wj) 


T-384 


where  n “ W2  +/w2  + | |W3|  |2  , e = n *^2  + I ^ ^ and 
— / 2 2 

W2  ■ - /w2  + | |W  I | ; As  an  example  consider  the  Householder  trans- 

f'\  . n\ 

formation  required  to  transform  W=l6jtoW  = |W2j  . This  is  given  by 

\l/  \°oJ 

n = 6 + /36  + (9+25)  = 6 + /70 
e * (6  + /?0)  x /TO  - 6/fo  + 70 


H 


I - 


' 0 N 

0 

6+/70 
3 
5 

/ 


W2  - - /to 


(6/70  + 70)-1  (0  0 6+/70  3 5) 


3.  Updating  UUT  Factorizations 

Givens  and  Householder  transformations  were  originally  used  to 
transform  a symmetric  matrix  to  a symmetric  tridiagonal  matrix  in  order  to 
find  its  eigenvalues.  These  orthogonal  transformations  are  now  being  used 
in  a wide  variety  of  numerical  analysis  problems.  Here,  we  present  a method 

T „ 

for  modifying  a factorization  of  the  form  W *»  UU  \U  is  a t x t upper 
triangular  matrix)  to  a prespecified  form  when  it  perturbed  by  a dyad. 
These  updating  procedures  will  be  used  in  Section  4 to  introduce  a new 
matrix  factorization  which  is  based  on  the  work  of  Bennett  and  Green  (1966). 
Here  we  will  consider  only  three  cases  which  will  arise  in  the  process 
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of  updating.  In  Section  4 we  present  the  criteria  applicable  to  each 
of  these  cases. 


Case  1 


T T 

Let  W ■ UU  be  perturbed  by  a dyad  (aca  ) that  does  not  change 


- T T 

the  rank  of  W = UU  + aca  from  W (where  U is  a t x t upper  triangular 

matrix,  a is  a t x 1 vector  and  c is  a scalar  which  could  be  either 

- T T 

positive  or  negative) . It  is  desired  to  factorize  W = UU  + aca  to  the 
— t — 

form:  W * UU  where  U is  a t x t upper  triangular  matrix.  Consider  the 
expanded  form  of  W for  c > 0: 

“ll  °12  Ult  I U11 

°22  •••  °2t  ! *2A  U12  U22 


U11 

U12 

CM 

CM 

:=> 

• 

- 

• 

• 

• 

• 

Ult 

U2t 

Utt 

a Vc 
\ 1 

CO 

Let  the  augmented  upper  triangular  matrix  be  represented  by  U,  then: 

— aaT 

W = UU 

Note  that  by  postraultiplying  U by  Givens  matrix  G,  and  premultiplying 

T ■ 

the  U by  G does  not  change  W,  i.e., 

/v  1>T 
W - UGG  U 

A 

The  proper  choice  of  G is  one  that  when  postmultiplied  to  U zeroizes  the 

(t,t+l)  element  of  U.  Repeated  application  of  Givens  matrices  can  elimi- 

T 

nate  the  last  column  of  U.  Also  the  repeated  premultiplying  of  U by  G 
would  result  in  elimination  of  the  last  row  of  U . 
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Case  2 


T T - T T 

Let  W = UU  be  perturbed  by  aca  such  that  W = UU  + aca  has 


a rank  one  less  than  that  of  W (c  is  a nonpositive  scalar).  It  is 

— T 

desired  to  modify  W = W + aca  to  the  form: 


W = C (UT|0)G1 

where  G is  an  t x t orthogonal  matrix  and  U here  is  a (t-1)  x (t-1) 
upper  triangular  matrix.  The  procedure  starts  as  in  the  previous  case, 

A 

even  though  the  last  column  of  the  augmented  matrix  U is  a complex 
vector.  The  first  phase  of  the  procedure  is  terminated  when  the  last 
column  of  augmented  matrix  U is  eliminated.  During  this  procedure  the 

A 

first  column  of  U also  vanishes,  i.e.,  at  the  conclusion  of  this  phase 
we  have: 


U11  U12  U13 


u2i  J22  tf23 


0 S32  °33 


l.t-1 


2, t-1 


03,t-l 


'll 

®21  ° 

0 

0 

12 

°22  °32 

0 

0 

13 

^23  ^33 

l,t-l 

^2, t-1  ^3, t-1 

• • * 

5t,t 

: o 


1°  0 °t,t-l  J 

The  second  phase  consists  of  premultiplying  the  left  matrix  U with  a 
sequence  of  Givens  matrices  such  that  the  element  directly  below  the 
diagonal  vanishes.  Also  postmultiplying  the  matrix  U with  a sequence 
of  Givens  matrices  will  yield  the  desired  factorization.  As  an  example 
of  this  case,  consider  the  following  problem. 
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Let  W 


Find  W 


0 

4 

5 


^ ^ T 

U1U1 


Phase  one  of  this  procedure  starts  by  selecting  Givens  matrices  such 

a ^ T 

that  the  (3,4)  position  of  U.  and  (4,3)  position  of  U-  vanishes. 


W = 


1 2 
0 4 
0 0 


3 

5 

6 


3i 

61 

4i 


/ 

1 

0 

0 

\ 

0 

/ 

1 

0 

0 

\ 

0 

0 

1 

0 

0 

0 

1 

0 

0 

0 

0 M 

0 

0 

3/5 

5 

2/5. 

T1 

0 

0 Mi 

3/5 

5 

0 

0 

3/5, 

—x) 

1 

2 

3 

3i 


0 

4 

5 


0 

0 

6 


6i  4i 


W 


4 

0 


3/5 

5 

3/$ 

5 

2/5 


3/5  ^ 


8/5, 

5 

0 


1 0 

2 4 

3/5  3/5 

5 5 

3/5.  8/5. 


0 

0 

2/5 


A /N'T 
= U2U2 


- T 


Next  we  eliminate  the  (2,4)  and  (4,2)  positions  of  Ug  and  » respec- 
tively. It  is  important  to  note  that  in  actual  computation  we  need  only 

A 

to  consider  U and  find  Givens  matrix  which  eliminates  selected  positions 

^ — — T 

in  U.  Then  after  determining  U,  its  transpose  U is  readily  available. 
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, 3 A 3A  ' 

12-j 5-1 

0 * f ^ 

0 0 2/5  0 


0 1 fi 


0 1/5  0 -21  0 /5  0 21 


3/5  3/5 

5 5 


0 21 


0 /s  J (0  -21  0 l&i  Sfi  0 


3/5  -1 

5 

3/5  0 

5 

2^5  0 


of  U3,  we  get 


4/5  3/5 


1 

0 

4/5 

4/5 

5 

5 

3/5 

3/5 

5 

5 

-i 

0 

e more  time 

/ 

4/5 

4/5 

5 

5 

3/5 

3/5 

5 

* 

5 

A A T 

U3U3 


Continuing  the  procedure  one  more  time  to  eliminate  the  (1»4)  position 


/\  /v  T 

= Va 


W = 4/5  3J5 
5 5 

0 2/5 


The  second  phase  of  this  procedure  is  used  to  transform 
W - U^U^T  to  W = GTUUTG.  this  phase  we  use  Givens  matrices  which  place 
zero  at  the  (2,1)  and  (3,2)  positions  of  U^.  Eliminating  the  (2,1) 
element  of  the  matrix: 
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/2  W2 

2 2 

/2  Jl 


k0  0 


Jl  W2 
2 2 


2 

2 

Ji 

2 

7i 

2 

.0 

0 

y 

Ji 

2 

-12 

2 

Ji 

2 

ft 

2 

o 

✓ 

0 

T** —T 

gJuitg 


otice  in  this  particular  example  that,  while  attempting  to  reduce 

A 

(2,1)  element  of  to  zero,  the  (2,2)  element  also  vanishes, 
refore  in  order  to  transform  W to  the  required  factorization,  we 
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Case  3 


Here  we  modify 


W = (u\ 

\0  / 


T T 

(U,0)  by  a dyad  bcb  where  b is  a 


n x 1 vector,  c is  a scalar  and  t < n . It  is  desired  to  transform 


W - /U\(u?0)  + 

Vo/ 


bcb  to  the  form  of 


T I TI 1 T 

W - H I^J  (U  |0)H 


where  H is  an  n x n orthogonal  matrix  and  U is  an  upper  triangular 
matrix  of  rank  t + 1. 


UU  ”12 


' ' "it  bl^  ] f°ll 


U22  • ' ' U2t  b2^  U12  "22 


°tt  bt^  Ult  °2t  *"  ”ct 


b.i/c  b-r'c...  b^nt  b.  ...  b >/c 


Let  W = ZZ  and  let  H be  a Householder  matrix  which  can  reduce 
the  (t+2)th  through  the  nth  position  of  the  last  column  of  Z to  zero,  i.e., 

HZ  - (£)  • 

_T  — T 

Due  to  the  symmetry  of  H,  Z H = (U  10).  Then  the  desired  factorization 


is  given: 


— T — T T 

W - HHZZ  H H 


- HT(^)(UT|0)H 
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4.  Estimating  the  Positive  Part  of  the  Hessian  Matrix 
and  Directions  of  Nonpositive  Curvature 

In  this  section  we  will  give  a general  algorithm  for  estimating  the 
positive  part  of  the  Hessian  matrix  and  as  a byproduct  direction  of  non- 
positive curvature.  The  Hessian  matrix  is  assumed  in  outer  product  or  dyadic 
form,  i.e. 


V2f(xk.)  = F(k) 


® k k / k\' 
I a.c.  la. | 

J-l  3 J \ V 


First  a few  preliminary  results  are  required.  In  the  following,  the 
matrix  A will  be  assumed  positive  semi-definite,  a is  an  n by  1 vector, 
e is  n by  1,  and  c is  a scalar. 


If  Ae  = 0 , then 


[A+Azaz  A]e  = 0 


(4.1) 


for  all  vectors  z , all  scalars  a . 

If  aTA+  ^ 0 , and  [I  - AA+]a  = 0 , then 

[A+a(-aTA+a)_1aT]A+a  = 0 . (4.2) 

A necessary  and  sufficient  condition  that  [I-AA+]a  = 0 is  that 
T 

a e = 0 for  all  vectors  e where  Ae  = 0 . (4.3) 

T 

A necessary  condition  for  A + aca  to  be  positive  semi-definite  is 


T + 

1 + ca  A a > 0 . 


If  [I-AA  ]a  = 0 , then  (4.4)  is  a sufficient  condition  for 


A + aca  to  be  positive  semi-definite. 


(4.4) 


(4.5) 


For  simplicity  of  notation,  the  superscript  (k)  is  omitted  from  the 
general  description  of  the  algorithm.  Let  P denote  the  estimate  of  the 
positive  part  of  F , and  P*  denote  its  generalized  inverse  of  the  beginning 

of  iteration  j . Assume  the  dyads  are  relabelled  so  that 
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R - {l,...,r>  and  S - {r+l,...,s}  , with  c^  > 0 , jeR  , c^  < 0,  JeS  . 
Then 

„ r r T r T 

) a,  c.a.  +•  ) a.c.a. 

L j J J L 


] ] 3 


3-1  J J J j=r+l 
The  dyads  with  positive  scalars  are  treated  first,  and  then  as  many 
of  the  dyads  with  negative  scalars  are  "absorbed"  until  further  absorption 
is  impossible. 

Set  Px  - 0 , j = 0 . 

A.  Set  j = j + 1 


If  j > r go  to  B . 


Set  Pj+1  Pj  + ajCjaj  * 
Compute  P*+1  (see  Case (2(b))  ) 


Return  to  A . 


B.  If  j > s , go  to  C . 

If  [I-PjP+la  f 0 , set  P = P 
go  to  B . 


(4.6) 


Set  j = j+1 


If  1 1- P j ? a j = ® ’ calculate  Kj  = 1 + cjajPjaj  • 

If  K > 0 , set  P.^.  = P.  + a.c.aT  . Set  S = S - {1}  . 
j ~ 3+1  J 3 J J 


Compute  Pj+i  (see  (case (la)  or  a(b)) 
go  to  B . 

If  Kj  < 0 , set  j = j + 1 , go  to  B 


Set  j = j + 1 , 

(4.7) 

(For  a possible 


partial  absorption  of  the  dyad  see  the  discussion  in 
Section  8). 

C.  Now  Pg+^  is  an  estimate  of  the  positive  part  of  F . It 
is  a positive  semi-definite  matrix.  If  it  is  not  positive 
definite,  directions  of  nonpositive  or  negative  curvature 
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are  of  the  form 


[1-ps+l  Ps+l] 


z for  any  z such  that 


there  exists  an  i(z)  0 S where  a. 


[I-Ps+lPs«] 


z f 0 . 


To  see  this. 


*TK+iCi] p t-ViCi]* 

■ '’KAj  [w  ^ viaI]  [i-ps+ips+i]i 

i [zI[I-ps+lpr+l]*i(2,]2  -Hz)  < ° • 


(4.8) 


If  jeS  at  the  end  of  the  iterations  it  needs  to  be  shown  that 


IK+iCih  * ° 


(4.9) 


(which  then  implies  that  P++^  aj  is  a direction  of  negative  curvature). 

There  are  two  possibilities  for  j being  in  S . First, 
fl-PjP jla^  ^ 0 . By  (4.3)  there  is  an  e such  that  Pje  = 0 and  a^e  ^ 0 . 


jl-PjPjJa^  ^ 0 . By  (4.3)  there  is  an  e such  that  P^e  = 0 and  a% 
The  only  updating  to  P^  from  iteration  to  iteration  is  when  jl-P^P^J 


ak  = ° 


which  implies  that  if  P^+^  f P^  , P^+^  = P^  + P^z  y z P^  . From  (4.1)  it 


follows  by  induction  that  ^s+^e  = 0 • 


I-Ps+lPs+l] 


*j  * 0 


The  second  case  is  when 


|l-P..pjJa..  =0  and  < 0 . Now  by  (4.2), 


Pj+^Pja^  ■ 0 . Also,  a^  P^V  ^ 0 • Therefore  by  (4.3), 


[I_pj+ipj+i] 


*j  * 0 


The  same  argument  as  above  yields  (4.9). 

In  the  specific  realization  of  this  algorithm  given  next,  directions 
of  nonpositive  curvature  and  negative  curvature  will  be  more  clearly  defined. 

Now  we  present  a numerically  stable  method  for  computing  the  generalized 
inverse  of  approximate  positive  part  of  the  Hessian  matrix.  This  method  does 
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I 

I 

\ 


the  linear  algebra  required  by  steps  (4.6)  and  (4.7)  of  the  general 
algorithm  just  presented.  At  each  iteration  j (which  is  suppressed  for 
notational  simplicity)  P is  assumed  to  be  of  the  form 

p - QT  (3)  (UT|0)Q 

where  Q is  an  n by  n orthogonal  matrix,  and  U is  a t x t upper 
triangular  matrix  of  full  rank.  The  generalized  inverse  of  P is  therefore 

p+  “ qT  (*r~)  (u_1l°)Q  • 

There  is  never  any  reason  to  compute  this  explicitly. 

Initially  PQ  has  a Q equal  to  the  identity  matrix,  and  U has 

rank  zero.  The  following  cases  for  the  updating  procedure  are  in  slightly 
different  order  than  presented  in  the  preceding  general  algorithm.  The 
appropriate  correspondences  are  indicated. 

Algorithm 

Assume  P • QT  (UT|0)Q  is  known  where  Q is  an  n x n orthogonal 
matrix  and  U is  t x t full  rank  upper  triangular  matrix.  It  is 

T* 

desired  to  find  whether  P * P + aca  is  positive  semidefinite  for 
some  n x 1 vector  a and  scalar  c ; and  when  P is  positive 
semidefinite  to  compute  the  QTU  factorization  of  P . 

QTU  Algorithm 
Compute  the  vector 

(|).  K 

where  w is  t x 1 and  8 is  (n-t)  x 1 are  real  or  complex  vectors 
depending  on  whether  c is  positive  or  negative.  There  are  two  cases 
to  be  considered. 
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Case  1 


The  vector  g * ^ . This  implies  (I-PP+)a  = 0 since 

Q(I-PP+  )a»^c  = Qa/Z  = QQT  (J)  (UT|0)QQT  | (U_1|0)Qa»^ 


Qa/c 


■ \ 0 0 


Qa/c 


(4.10) 


/oj\  |u\ 

U/ " lor 


It  can  be  seen  that  when  g = 0 , (4.10)  is  equal  to  zero.  Note  that  this 
case  also  embodies  the  situation  when  U is  an  n x n upper  triangular 
matrix,  i.e. , t = n . 

T + 

Next,  compute  K = 1 + ca  p a . There  are  three  possibilities  which 
need  to  be  considered. 

— T 

a)  If  K > -0  , then  from  (4.5),  it  follows  that  ? = P + aca 
is  positive  semidefinite,  and  the  rank  of  P is  equal  to 
t . (This  corresponds  to  (4.7)  when  Kj  > 0)  . 


— T 

p - Q I* 


+ aca 


(«} <uT|,i)Q 
* qI  (f)  ^ (f)  <“>  0 

■ + If)  • 


(4.11) 


Then  using  Givens  transformation  (see  Section  2)  we  can  transform 
(4.11)  to 


- 

P - Q 


(!)  «T|«>q 


(where  U is  a t x t upper  triangular  matrix) , 
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b)  If  K = 0 , from  (4.1),  (4,2)  and  (4.5)  it  follows  that 
» x 

P * P + aca  is  positive  semidefinite  and  the  rank  of 

p is  t - 1 . The  rank  reduction  occurs  because  a 
positive  eigenvalue  now  becomes  zero.  Givens  transforma- 
tions could  be  used  to  convert  (4.11)  to 


u 


(I) 


(UT|°)  jj-4 


where  G is  a t x t orthogonal  matrix  and  U is  a 
(t-1)  x (t-1)  upper  triangular  matrix. 


— T 

Let  Q 


i.e. 


(U*! 


0)Q 


(This  corresponds  to  (4.9)  when 


0 .) 


c)  When  K < 0 the  matrix  p is  not  positive  semidefinite. 

No  modification  is  made  although  as  discussed  in  Section  8, 
part  of  the  dyad  could  be  absorbed. 


Case  2 


In  this  case  vector  8 is  not  a zero  vector,  (i.e.  [I-PP  ]a  / 0 . 

a.  If  c < 0 then  P is  not  positive  semidefinite  and, 
consequently,  there  is  no  need  to  update.  (This  corresponds 
to  the  first  line  following  8 .) 

b.  If  c > 0 , p is  positive  semidefinite;  then 


+ aca 


Q1  $ (VT|VQ 

qT  ($)  <uT^)Q  + qT  (f)  (a)TipT)Q 


(4.12) 
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To  convert  (4.12)  to  the  proper  form  a Householder 
transformation  can  be  used  to  make  the  following 
transformation. 


H 


U loo 
0 8 


where  r is  a scalar. 
Letting  Q = HQ  and  U 


— T 

P * Q 


(I)  (5l|«« 


0 

r 

L* 

*J 

'u 

“fl 

lo 

rj 

u 

is 

, then 


where  U is  a (t+1)  by  (t+1)  upper 


triangular  matrix.  It  is  apparent  that  whenever  8 = , 


where  r is  scalar,  then  H is  equal  to  the  identity 

matrix.  (This  corresponds  to  (4.6)). 

After  the  Hessian  matrix  has  been  processed  according  to  the  two 
previously  defined  algorithms  the  situation  is 


F 


IS) 


(ut|o)q 


+ ^jes  ajcjaj 


The  matrix  to  the  right  of  the  equals  sign  is  an  estimate  of  the  positive 
part  of  F . If  S = 0 , and  if  U is  n by  n , then  F is  positive 
definite.  If  S * 0 and  U is  t by  t with  t < n , F is  positive 
semidefinite  singular  and  any  of  the  last  (n-t)  rows  of  Q are  directions 
of  zero  curvature. 


When  S ^ 0 , any  of  the  last  (n-t)  rows  of  Q is  a direction  of 
nonpositive  curvature,  and  at  least  one  of  them  is  a direction  of  negative 
curvature.  To  see  this,  note  that 


nvli'*)  ’ q2Vj 


where  Q2  is  the  matrix  consisting  of  the  last  (n-t)  rows  of  Q , and 

T 

is  any  vector  with  jeS  . From  (4.9),  it  follows  that  a^  is  not 
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This  row  is  therefore 


5.  Modified  Newton  Strategies 

In  this  section  we  outline  three  modified  Newton  strategies.  We 
use  the  QTU  factorization  for  eavaluating  the  approximation  to  the  positive 
part  of  the  Hessian  matrix  and  directions  of  nonpositive  curvature.  In 
Section  7 these  strategies  are  evaluated. 


Let  Xq  be  a given  initial  point;  then  at  the  kth  iteration  x^  is 
known  and  x^+^  needs  to  be  determined. 

Strategy  1 (S1MNM) 


If  the  Hessian  matrix  at  x^  is  positive  definite,  set 
2 -1 

s^  = -V  f(x^)  Vf(x^)  , and  find  x^^  using  the  optimal  step  size 

procedure  (OSSP)  (see  (5.1)  ) . If  it  is  not  positive  definite,  select 
d^  , a nonascent  direction  of  nonpositive  curvature.  Set  s^  * d^  and 
find  x^+^  using  OSSP.  There  are  many  different  ways  d^  can  be  chosen 

and  these  are  discussed  later  in  this  section. 

2 

The  motivation  for  this  algorithm  is  that  when  V f (x^)  is  not 


positive  definite  moving  along  a direction  of  nonpositive  curvature  will 
tend  to  move  the  sequence  of  points  into  a region  where  the  Hessian  is 
positive  definite  so  that  the  usual  Newton  move  will  apply.  Even  when  the 
Hessian  matrix  is  positive  definite,  it  is  deemed  advantageous  to  use  the 
optimal  step  size  procedure  instead  of  using  a step  of  size  of  one  as 
prescribed  by  the  classical  method. 


Strategy  2 (S2MNM) 

This  is  the  same  as  S1MNM  except  that  following  the  move  along  d^ 
in  the  nonpositive  definite  case  another  move  is  made  in  the  direction 


< Vf(V 


Formally,  let  y. 


*k  + <^ktk  w^ere 


solves  OSSP. 
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Then  find  solving  min  f [y^-P^Vf (x^t]  • Set  x^+1  = y^-P^Vf (x^)"^  • 

This  extra  step  in  the  iteration  is  an  attempt  to  minimize  some  portion 
of  the  function  even  though  it  doesn't  act  like  a strictly  convex  function 
based  on  the  information  at  x^  . 

Strategy  3 (S3MNM) 

Let  0 < a < 1 be  a given  constant.  If  the  Hessian  matrix  at 

2 —l 

x^  is  positive  definite,  set  s^  = -V  f(x^)  Vf(x^)  » ancl  d^  = 0 . 

Find  using  the  Second  Order  Armijo  Step  Size  procedure  given  below. 

If  the  Hessian  is  not  positive  definite,  set  s^  = -p+  Vf(x^)  , select  d^ 

a direction  of  nonpositive  curvature,  and  use  the  Second  Order  Armijo  Step 
Size  procedure  (5.2)  to  find  x^+^  • This  can  be  thought  of  as  a way  of 

combining  a move  which  attempts  directly  to  minimize  f with  a move  which 
attempts  to  get  the  sequence  in  a region  where  the  Hessian  matrix  is 
positive  definite. 

The  Optimal  Step  Size  Procedure  (OSSP) 


Given  s^  , a nonascent  direction,  set 

\+i  = *k  + s’ c 


k k 


(5.1) 


where  t,  solves 
k 


min  f[x,+s  t] 
t>0  K K 


Second  Order  Armijo  Step  Size  Procedure 

Let  a be  a given  constant  where  0 < a < 1 . Assume  at  x^  are 
given  s^  , a nonascent  direction,  and  d^  a nonascent  direction  of  non- 
positive curvature.  Define 

yk(i)  - \ + + dk2‘i/2  . 

Find  i(k),  the  smallest  integer  from  i * 0,1,...  such  that 
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f[yk(i)3  - f^)  < a2"i[s^f(xk)  + \ d£  V2f(xk)dk]  . (5.2) 

Set  *k+i " \[1(k)]  • 

The  above  three  strategies  all  require  a direction  of  nonpositive 
curvature.  The  QTU  algorithm  produces  some  as  the  last  (n-t)  rows  of  Q . 

For  any  ieS  (after  all  possible  dyads  have  been  absorbed) , any  +ps+^a^ 

also  a direction  of  negative  curvature.  Several  strategies  for  selecting 
from  these  were  evaluated.  Let  d , . . . ,d^  be  l available  nonascent 
directions  of  nonpositive  curvature. 

_ 0 4 

DNPC1:  Set  d = £ d /l  (the  average  of  the  possibilities). 
i=l 

DNPC2:  Set  d = +P  a_  where  a_  minimizes  1 + c.aTp^V.a.  for  ieS  . 

— s+l  II  i i s+1  i 

If  S is  empty,  DNPC3  is  used. 

DNPC3:  Set  d = d*  where  d^"  minimizes  (d^)^Fd^  for  i=l,...,£  . 

DNPC4:  Set  d = d1  where  d1  minimizes  (di)TVf(xk)  for  i=l . 

6.  Statement  of  Problems  Solved 

In  this  section  is  given  the  statement  of  the  problems  solved  by  the 
modified  Newton  methods.  In  Section  7 is  given  the  work  required  to  solve 
them  using  the  different  options.  Some  of  the  problems  are  constrained  and 
the  SUMT  method  for  solving  constrained  problems  using  a sequence  of 
unconstrained  problems  (to  which  the  methodology  applies)  is  briefly  presented. 

Consider  the  general  nonlinear  programming  problem  in  its  canonical 

form.  Find  X£Rn  by  solving 

minimize  f (X) 

subject  to 

gl  (X)  > 0 i=l,...,m 

hj  (X)  = 0 j=l P . 
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Among  the  several  existing  classes  of  techniques  for  solving  the  general 
nonlinear  programming  problem  are  methods  which  transform  a constrained 
problem  Into  a sequence  of  unconstrained  problems.  Here  we  only  briefly 
discuss  the  method  of  Fiacco  and  McCormick  (1968)  which  was  used  in  this 
paper  to  solve  a general  nonlinear  programming  problem.  The  proposed 
transformed  function  which  is  known  as  the  penalty  function  is  defined  as: 

P(X,r)  = f(X)-r  l In  g.  (X)  + £ hx  (X)/r 
i-1  1 j-1  X 

If  {r.  } is  an  infinite  sequence  of  points  such  that  lim  r,  = 0 
k k-*» 

then,  loosely  stated,  there  exists  a point  for  each  k that  minimizes 

(k)  (k) 

p(X,r^),  X ; every  limit  point  of  X is  an  optimal  solution  of  the 

original  constrained  problem.  The  reader  is  referred  to  Fiacco  and 
McCormick  for  a comprehensive  statement  and  proof  of  the  convergence 
theorems . 

Here,  following  Fiacco  and  McCormick  we  summarize  the  general 
iteration  of  this  algorithm.  Let  R®  = (X ) ; g(X)>0,  i=l,...,m}  . 

i.  Find  a point  X^  in  . 

ii.  Select  the  initial  value  of  r , r^  . The  selection 

of  r^  may  be  arbitrary  or  based  on  any  one  of  the 

criteria  described  in  Section  7.1  of  Fiacco  and 
McCormick  (1968). 

iii.  Find  an  unconstrained  minimum  of  P(X,r)  for  the 

current  value  of  r^  . The  method  used  in  this  paper 

to  solve  this  unconstrained  problem  is  the  Modified 
Newton  method  described  in  Section  5. 

iv.  Estimate  the  solution  of  the  original  constrained 
problem  by  extrapolation. 

v.  Stop  if  the  current  solution  satisfies  the  convergence 

criteria,  otherwise  select  r.  , , < r,  and  continue 

k+1  k 

the  procedure  from  step  iii. 
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We  now  present  the  solution  of  several  test  problems  which  were 
solved  by  the  Modified  Newton  method  described  in  Section  5.  The  pur- 
pose of  presenting  these  problems  and  their  solutions  in  this  section  is 
to  establish  the  credibility  of  the  QTU  algorithm  since  the  solutions  to 
these  problems  were  obtained  by  other  methods  also.  As  we  discussed  pre- 
viously, the  solution  procedure  for  general  nonlinear  programming  problems 
which  is  used  in  this  paper  is  based  on  the  solution  of  a sequence  of 
unconstrained  functions.  The  computer  program,  which  was  coded  and  used 
here,  requires  the  user  to  transform  the  unconstrained  problem  by 
explicitly  defining  the  penalty  function  (P(X,r))  as  the  objective  function. 

Shell  Dual 

This  problem  was  formulated  by  the  Shell  Development  Company  for 

the  purpose  of  testing  nonlinear  programming  routines. 

Minimize  : 

<X,Y) 


subject  to: 

5 5 10 

8i  5 ei  + 2 l (I  *k1V  + 3d  Xj  - l a Y > o for  i-l,...,5 

k=i  K 11  j=i  J1  3 

%+5  = Xi  2 o for  i=l , . . . ,5 

= ^ j — ° , ■ • • , 10 

The  values  of  a±J , b^  d^  eA  and  F are  given  in  Table  2.  The 
solution  to  this  problem  is: 

X1  = .3000021,  x2  = .3334455,  x3  = .4000091 
X4  **  .4283571,  x5  = .2240632,  = .1683284E-05 

X?  = . 3128900E-04,  Xg  - .5174274E  01,  xg  = .4377716E-04 

X1(J-  . 306 109 8E  01,  YL  - .1184054E  02,  Y2  - .1594590E-05 

Y3  - .1076526E-05,  Y^  - .1043444,  . .8907809E-04 
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Gasoline  Demand  Function 


A nonlinear  dynamic  demand  function  was  formulated  and  its 
parameters  were  estimated  using  the  technique  of  least  squares.  Let 


q = the  demand  for  gasoline  per  capita  in  ith  census  region  in  jth 

i » J 

1 

year. 


P = average  price  in  census  region  i for  year  j 
i * J 


D ■ disposable  personal  income  per  capita  for  region 
i * J 

.3 


in  year  j 


Mj  ■ average  miles  per  gallon  consumed  per  passenger  car 


For  this  study  we  considered  3 census  regions  over  a 9 year  period,  i.e. 
i-1,2,3,  and  j«l,2, . . . ,9.  The  demand  function  considered  was: 

qi.j  “ qi» j-1  6Xp[  _XlPi,j  + X2Pi,j-l  + X3°i,j  " X4°i, j-1  + V 


-X6MJ  Xi+7 


Table  3 provides  the  data  which  was  used  in  this  model. 

The  nonlinear  least  squares  problem  which  was  solved  is: 


= \ f {9.  < _q  * * i exP  f“xi 

i-1  j=2 


P. . -X.  P.  , . 
ij  2 x,j-l 


+ X3  Di,j  ' X4  Difj-1  + X5]  VP  - Xi+7>2 


The  solution  that  was  obtained  for  this  model  is: 


Xx  - .6668957E-02,  X2  = .5533945E-02,  X3  = .1442763E-01 


X4  - .1707499E-01,  X5  = . 8533931E-01,  X&  = .1050872E  04 

X,  - - . 2894233E-03,  XQ  = .2003835E  02,  Xrt  = .2350300E  02 
/ o 9 


X10-  .2460 7 30E  02 


1.  Source:  American  Petroleum  Institute 

2.  Source:  Platt’s  OILGRAM  price  service 

3.  Source:  U.S.  Department  of  Commerce,  Bureau  of  Economic  Analysis 

4.  Source:  U.S.  Highway  Statistics 
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TABLE  3 

DATA  FOR  GASOLINE  MODEL 


(Consumption/Capita)  in 

Gallons 

1 

2 

3 

391.98 

342.33 

382.45 

412.39 

361.13 

403.84 

437.68 

374.31 

427.27 

455.86 

395.28 

447.19 

480.13 

406.07 

468.80 

511.38 

424.49 

479.68 

533.24 

433.06 

523.73 

516.65 

423.21 

504.57 

527.57 

- 

428.98 

510.36 

(Disposable  Consumption/ 
Capita  x lOO) in  Dollars 


year 

1967 

1968 

1969 

1970 

1971 

1972 

1973 

1974 

1975 


20.65 

21.4656 

21.9004 

22.3686 

23.2121 

24.4494 

25.8538 

25.3773 

24.8000 


29.83 

30.60 
30.86 
31.50 
31.79 
32.36 
33.44 
32.85 

32.60 


24.34 

25.24 

25.83 

26.56 

27.46 

28.86 

30.41 

29.69 

28.90 


| 

Price,  Cents /Gallons ; In 
Constant  Dollar  Adjusted  by 

CPI  (1967  = 100) 

Average 

Miles/ 

Gallon 

Consumer 

Price 

Index 

1967 

33.08 

33.07 

32.68 

14.05 

100 

1968 

32.46 

33.00 

32.71 

13.91 

104.2 

1969 

31.61 

32.77 

32.50 

13.75 

109.8 

1970 

30.72 

31.77 

31.58 

13.7 

116.3 

1971 

30.12 

31.39 

31.07 

13.73 

121.3 

1972 

28.83 

29.94 

30.28 

13.67 

125.3 

1973 

29.08 

30.09 

30.11 

13.29 

133.1 

1974 

35.37 

36.70 

37.44 

13.71 

147.1 

1975 

34.53 

35.27 

35.43 

13.81 

161.2 
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Multi-item  Continuous  Review  Inventory  Model 

Schrady  and  Choe  (1971)  formulated  a multi-item,  continuous 

review  inventory  model  with  back  orders.  The  formulation  involved 

the  minimization  of  total  time-weighted  units  short  per  unit  time  when 

the  lead  time  demand  for  the  ith  item  was  normally  distributed  with 

2 

mean  y and  variance  o^.  The  constraints  which  were  considered  are: 

(i)  Total  average  investment  cost,  less  than  or  equal  to  an 
investment  limit. 

(ii)  The  total  number  of  orders  per  unit  time  less  than  a 
specified  amount. 

Let  for  the  itlt  item: 

■ item  cost 

A^  = mean  demand  per  unit  time 

•Kr^,)  = probability  that  the  lead  time  demand  is  greater 

than 

r^  = reorder  point 

= reorder  quantity 
= investment  limit 

■ reorder  limit 

N *>  total  number  of  item s in  inventory 

They  derive  the  following  mathematical  programming  problem 

N . . 

Minimize  f = £ i'TV 

Q,r  i=l  Q. 


subject  to 


’>1  ~ ki  - Ci  (ri  + / ~ »i>  i °» 
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e •=  u _ V > 0 

87  - k2  L o — u* 

* * i=l  Wi 


s2+i  = Qi>0  for  i = 1 N 


where  8 


ri)  = 2 [oi  + (ri  ' V*  * (_^  ' T (ri  ' Pi>  *<-^and 


.2  °i 


V^jU 


-x2/2 

1 e 2 rx 

♦ (x)  = and  <t(r)  = j 


<j>(x)dx 


TABLE  4 

DATA  FOR  INVENTORY  MODEL 


N=3 

Item  1 

Item  2 

Item  3 

Xi 

1000 

1500 

2000 

Ci 

1 

10 

20 

vi 

100 

200 

300 

o. 

100 

100 

200 

i 

with  k^  = 8000  and  = 15 

The  problem  was  solved  using  data  given  in  Table  4 and  the  results  are: 


5331615E 

03, 

Q2  = .2455901E  03, 

Q3  = . 2850370E 

03 

2527785E 

03, 

r2  = .2770080E  03, 

r3  = .4366119E 

03 
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Consumer  Expenditure  Model 

Friedman  and  Meiselman  (1963)  estimated  consumer  expenditure  based 
on  the  money  stock  using  a simple  model 

Ct  = a + SMt  + et  (6.1) 

where  Ct  is  the  consumer  expenditure  for  year  t,  Mt  is  its  corresponding 

money  stock  and  follows  a first  order  autoregressive  scheme,  i.e., 

e£  “ Ufc  + P£t_^  where  p is  an  unknown  parameter  with  0 < p < 1 and  U 

2 

is  distributed  normally  with  mean  0 and  variance  a . The  logarithmic 
likelihood  function  is  given  by  (5.2).) 

L = log  (2r)  - log  o*  - \ [(C.  - pC  .)  - a(l-p) 

L U 2<T  t=2 

u 

- B(Xt  - px^)]2  (6 -2) 

A heuristic  solution  to  this  nonlinear  optimization  problem  was  obtained 
by  Kmenta  (1971).  Table  5 provides  the  necessary  data  for  estimating  the 
parameters  of  6.2.  The  problem  which  we  solved  in  its  canonical  form  is: 

1 n ' 
Minimize:  f = (n-1)  logt  + ~ I [ (ct  “ PC^)  “ a(l-p)  - g(Xt  - pX^}  ] 

Subject  to  g^  = p + .9999  >_  0 

*2  = T 1 0 

g3  = .9999  - p > 0 

where  T = a2 

Our  algorithm,  for  varying  starting  values  of  r of  the  penalty 
function,  yielded  two  local  minima.  The  solution  which  resulted  in  the 
lowest  value  in  objective  function  was  not  discovered  by  Kmenta. 
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TABLE  5 

DATA  FOR  CONSUMER  EXPENDITURE  MODEL 


Year  and 
Quarter 

Consumer 

Expenditure 

Money 

Stock 

Year  and 
Quarter 

Consumer 

Expenditure 

Money 

Stock* 

1952  I 

214.6 

159.3 

1954 

III 

238.7 

173.9 

II 

217.7 

161.2 

IV 

243.2 

176.1 

III 

219.6 

162.8 

IV 

227.2 

164.4 

1955 

I 

249.4 

178.0 

II 

254.3 

179.1 

1953  I 

230.9 

165. 9 

III 

260.9 

180.2 

II 

233.3 

167.9 

IV 

263.3 

181.2 

III 

234.1 

168.3 

IV 

232.3 

169.7 

1956 

I 

265.6 

181.6 

II 

268.2 

182.5 

1954  I 

233.7 

170.5 

III 

270.4 

183.3 

II 

236.5' 

171.6 

IV 

275.6 

184.3 

The  solution  which  resulted  in  the  lowest  value  in  objective  function  is 
0*  = 4.372803,  p = .9999,  a = .1896194E  05,  8 =*  1.004103 

The  other  local  rainimizer  is 

a2  = 4.4638454,  p = .8240537,  a = -.2354883E  03,  6 = 2.753057 

u 

Even  though  there  is  not  a substantial  difference  between  the 
optimal  value  of  the  objective  function  for  these  two  solutions,  the 
economical  interpretation  of  these  estimators  is  no  longer 
identical. 


Source:  Milton  Friedman  and  David  Meiselman  (1963) 
*In  billions  of  dollars 
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The  solution  for  this  problem  was  facilitated  by  selecting  a 
very  small  value  for  r^  (r^  = 1.0E  - 07)  . 

Criteria  for  Evaluation  of  Modified  Newton's  Methods 

In  Section  5 three  possible  strategies  of  modified  Newton  methods 
(S1MNM,  S2MNM,  S3MNM)  were  presented.  In  this  section  we  attempt  to 
study  the  computational  characteristics  of  each  of  these  methods  when 
the  inverse  of  the  Hessian  matrix  is  obtained  via  the  QTU  algorithm. 

Here  we  determine  if  any  one  of  these  three  strategies  is  superior  over 
the  remaining  two. 

The  line  search  used  in  S1MNM  and  S2MNM  is  the  "exact"  line 
search  OSSP  while  in  S1MNM  we  use  McCormick's  generalization  of  Armijo's 
method. 

Another  pending  question  which  we  attempt  to  answer  is  which 
one  of  the  four  strategies  for  selection  of  descent  direction  of  non- 
positive curvature  that  was  discussed  in  Section  5 is  most  appropriate 
to  be  used  with  modified  Newton  methods.  In  order  to  measure  the 


computational  efficiency  of  each  method  we  need  to  specify  several 
criteria  for  selection  of  the  best  algorithm.  The  most  important 
criteria  for  a nonlinear  programming  algorithm  is  its  reliability  in 
obtaining  a correct  solution.  For  the  purpose  of  this  experiment  we 
selected  several  well-known  test  problems  with  known  solutions. 

Since  all  three  strategies  of  modified  Newton  methods  have 
basically  the  same  procedure  when  the  Hessian  matrix  is  positive 
definite,  one  would  be  interested  to  examine  these  methods  only  for 
points  where  the  Hessian  matrix  is  nonpositive.  Therefore,  it  is 
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important  to  count  the  number  of  moves  along  the  direction  of  nonpositive 
curvature  during  the  solution  procedure. 

The  next  important  criteria  is  the  number  of  functions, 
gradient  evaluation  when  a point  with  nonpositive  Hessian  matrix  is 
encountered.  Another  criteria  which  we  considered  is  the 
CPU  time  spent  for  each  problem.  The  time  which  was  measured  is  the 
actual  time  used  rather  than  merely  the  elapsed  time  from  the  start  of 
the  execution  to  termination.  The  former  is  more  desirable 
since  it  does  not  depend  on  CPU  utilization. 

A note  of  caution  is  in  order  to  warn  the  reader  that  these 
criteria  are  only  used  to  evaluate  a portion  of  the  optimization 
rather  than  the  overall  performance  of  this  code.  The  following 
functions  were  used  to  carry  out  the  experiments.  The  alphabetic 
letters  next  to  each  test  problem  will  be  used  to  identify  the  test 
problems. 


(A)  Shell  Dual 

(B)  Gasoline  Demand  Function 

(C)  Inventory  Model 

(D)  Consumer  Expenditure  Model 

(E)  Wood's  Function: 

minimize:  f(X)  = 100  (X2  - X2)2  + (1  - X^2  + 90  (X4  - X2)2 
+(i-x3)2  + 10.1[(X2-1)2  + (x4  -1>2] 

+19.8(X2-1)(X4-1) 

; i 

The  starting  point  is  (-3,-1, -3,-1) 


: 


i 


I 
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(F)  Rosenbrock's  Problem 


minimize:  f(X)  ± (1-X^2  + 100(X2  - X2)2 
The  starting  point  is  (-1.2,  1.0) 

(G)  Powell's  Function  I 

minimize:  f(X)  = (Xx  + 10X2) 2 + 5(X3  - X4>2  + (X2  - 2X.j)4  + 10(Xj-X4)4 
The  starting  point  is  (3. 0,-1. 0,0. 0.1.0) 

(H)  Powell's  Badly  Scaled  Function 

minimize:  f(X)  = (104X  X2  - l)2  + (e"Xl  + e"X2  -1.0001)2 
The  starting  point  is  (0.0,  1.0) 

(I)  Four  Cluster  Function 

minimize:  f(X)  = [(X3  - X2)  (X3  - sinX2)]2  + [(cos(X2)  - X^ 

(X2  - cos  (Xx))]2 
The  starting  point  is  (0.0,  0.0) 

(J)  Brown's  Function  with  Two  Global  Minima 
minimize:  f(X)  = (X2  - X£  - l)2  + ((Xx  - X2)2  + (X2  - 0.5)2  -l)2 
The  starting  point  is  (0.1, 2.0) 

(K)  Penalty  Function  I 

4 4 

minimize:  f(X)  = 10_5  £ (X.  - l)2  + [ l X?  - \]2 

i-1  1 i=l  1 

The  starting  point  is  (1. ,2. ,3. ,4.) 

(L)  Box's  Function 

minimize:  f(X)  = \ {e  xl^i  - e x2^i  - X,  (e  ^ - e *^*)}2 

i=l  i 

The  starting  point  is:  (0,20,20) 
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(M)  Gottfried's  Function 

minimize:  f(X)  = (Xx  - .1136(X1  + 3X2)  (1  - X^)2  + (X2  + 
7.5  (2XX  - X2)  (1  - X2))2 

The  starting  point  is:  (0.5, 0.5) 

(N)  Hyperbola-Circle  Function 
minimize:  f(X)  = (X^  - l)2  + (X2  + X2  - 4) 2 


The  starting  point  is:  (0.,1.0) 

(O)  Brown's  Badly  Scaled  Problem 

minimize:  f(X)  = (Xx  - 106)2  + (X2  - 2xl0-6)2  + (X^  - 2) 2 

The  starting  point  is:  (1.0,  1.0) 

(P)  Cubic  Function 

minimize:  f(X)  H X®  + 2 (•  1X2(X£  - l))2  + (X£  - l)8  + 2(.1X2(X3  - l)2)2 

+(X3  - 1)8+  2 

The  starting  point  is:  (2.0, -3.0 ,3.0) 

(Q)  Rosenbrock  Cliff 


f(X)  s ]2  - (Xx  - X2)  + e20(Xi  ■ V 


The  starting  point  is:  (0.0, -1.0) 


Experimental  Procedure 

In  Appendix  C of  Emaml  (1978)  a complete  listing  of  each  test 
problem  is  given  when  the  solution  procedure  is  based  on  S1MNM  and  direc- 
tions of  nonpositive  curvatures  are  evaluated  according  to  DNPC1  strategy. 
The  convergence  criteria  which  was  selected  is 
| V'IP(x|1^r)  [V2P(X,(i)r)]"1VP(X,(i)r)|  < e 
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where  P(X^  ^r)  is  the  value  of  the  penalty  function  at  point 
for  parameter  r.  The  values  of  e which  were  used  in  a majority  of  the 
test  problems  were  set  equal  to  approximately  10  ^ . We  selected 
these  relatively  large  values  of  e in  order  to  expedite  the  termi- 
nation of  the  solution.  Each  computer  run  was  limited  by  three  minutes 
of  CPU  time  and  3000  printed  lines;  if  it  exceeded  these  limits 
an  interupt  would  cause  the  automatic  termination  of  the  problem. 

The  initial  values  of  r for  all  problems  were  set  equal  to  one 
except  for  the  Consumer  Expenditure  Model  for  which  we  select  r = l.E  - 07. 

In  order  to  measure  the  performance  of  various  strategies  under 

similar  conditions,  we  must  ensure  that  the  final  value  of  r is  the 

same  for  all  solution  strategies.  This  is  achieved  by  minimizing  the  penalty 

function  (P(X,r))  for  the  initial  value  of  r. 

The  evaluation  of  the  direction  of  nonpositive  curvature 
strategies  was  limited  to  three  strategies.  The  first  strategy, 
which  is  based  on  a computing  average  direction  of  nonpositive  curva- 
ture, could  not  be  evaluated  since  not  many  observations  were  possible 
among  these  test  problems.  It  is  important  to  note  that  the  average 
direction  of  a nonpositive  curvature  is  itself  a direction  of  non- 
positive curvature  when  the  estimate  of  the  positive  part  of  the 
Hessian  matrix  is  not  in  full  rank  and  each  direction  of  nonpositive 
curvature  is  equal  to  the  last  (n-t)  rows  of  Q matrix. 

7.  Numerical  Results 

In  this  section  we  present  the  results  of  numerical  experi- 
ments which  were  carried  out  for  various  strategies.  Each  test 
problem  is  identified  by  an  alphabetic  letter,  followed  by  a two 
digit  number.  The  first  digit  after  the  alphabetic  letter  identifies 
the  modified  Newton's  strategy  as  follows: 


40  - 


1  = S1MNM 


2  = S2MNM 


3  = S3MNM 


The  second  digit  signifies  the  strategy  used  for  selection  of  directions 
of  nonpositive  curvature  where 

1 = DNPC1 

2 = DNPC2 

3 = DNPC3 

4 = DNPC4 

As  an  example  B23  indicates  the  solution  of  gasoline  demand  when  the 
solution  procedure  was  strategy  2 of  modified  Newton  method  and  directions 
of  nonpositive  curvatures  were  evaluated  by  DNPC3  strategy. 

The  results  of  our  numerical  experiments  are  summarized  in  Tables 
6 through  14  and  can  be  divided  into  two  broad  categories.  In  the 
first  category  are  the  problems  where  the  solution  procedure  did  not 
generate  any  points  where  the  Hessian  matrix  was  nonpositive  definite. 

The  second  category  consists  of  those  problems  where  points  were 
generated  having  nonpositive  definite  Hessian  matrices.  Among  the  first 
group,  S3MNM  performed  best  since  the  number  of  function  evaluations 
were  considerably  less  than  the  other  two  strategies.  In  most 
instances  this  coincided  with  a reduction  in  execution  time.  However 
this  method  did  not  perform  satisfactorily  whenever  a point  with  a 
nonpositive  definite  Hessian  matrix  was  encountered.  It  seems  that 
the  algorithm  had  difficulty  in  guiding  the  moves  away  from  the  diffi- 
cult region.  This  is  indicated  by  the  large  number  of  moves  along 
directions  of  nonpositive  curvature  (see  Table  12,  Problems  D and  E) . 

This  result  does  not  fully  support  Sorenson' s result  (Sorenson 
1977)  even  though  he  used  a different  algorithm  for  finding  a point 

along  S(k)  - aS(k)  + d(k)  which  minimized  f(X(k)  + aS(k))  . Before 
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any  conclusion  can  be  reached  one  must  evaluate  Goldfarb's  algorithm 


(Goldfarb  1977)  and  compare  it  with  the  results  of  Sorenson  and  those 
contained  in  Tables  6 through  14  of  this  report.  Algorithm  S3MNM  had 
difficulties  with  overflow  problems  which  prohibited  it  from  finding 
the  solution  to  Problem  L. 

The  performances  of  strategies  one  and  two  were  very  similar 
except  that  the  algorithm  S1MNM  terminated  prematurely  on  Problem  I. 

It  is  important  to  note  that  had  we  selected  the  value  of  c equal 
to  zero  the  early  termination  would  not  have  resulted.  However,  due 
to  the  nature  of  S2MNM  which  at  each  iteration  makes  one  Newtonian 
move  and  may  or  may  not  move  along  a direction  of  nonpositive  curva- 
ture the  chance  of  premature  termination  is  practically  eliminated. 

On  the  basis  of  these  observations  we  recommend  S2MNM  as  a reliable 
and  efficient  optimization  algorithm. 

Our  preliminary  investigations  into  direction  of  nonpositive 
curvature  strategies  indicates  that  there  is  no  substantial  difference 
among  them.  The  test  problems  which  were  examined  here  did  not 
generate  many  alternative  directions  of  nonpositive  curvature.  We 
believe  as  these  directions  increase  in  number  the  selection  of  the 
"best"  direction  of  nonpositive  curvature  needs  special  care. 

However,  based  on  limited  number  of  problems,  we  recommend  DNPC2 
over  the  remaining  strategies. 


I 
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RESULT  OF  SLMNM  WITH  DNPC2 


RESULT  OF  S1MNM  WITH  DNPC3 


TABLE  8 

RESULT  OF  S1MNM  WITH  DNPC4 


1997866 


RESULT  OF  S3MNM  WITH  DNPC2 


TABLE  13 


'-384 


RESULT  OF  S3MNM  WITH  DNPC4 


T-384 


8.  Conclusions 

Several  papers  have  recently  investigated  the  problem  of  altering 
Newton's  method  when  the  Hessian  matrix  is  not  positive  definite.  The  first 
attempt  in  this  area  was  in  Chapter  8 of  Fiacco  and  McCormick  1968.  Recently 
Fletcher  and  Freeman  1975,  Gill  and  Murray  1974,  Goldfarb  1977,  and  Sorenson 
1977  have  proposed  ideas  for  this  situation.  Direct  comparison  of  their 
results  with  ours  is  difficult  because  in  most  cases  the  problems  used  were 
different  and  because  it  is  hard  to  generate  examples  on  which  comparisons 
can  be  made.  Usually  (as  seen  in  the  Tables)  only  one  or  two  cases  requiring 
modification  occur. 

The  results  of  this  study  were  inconclusive  in  many  respects.  Certainly 
the  obtaining  of  estimates  of  the  positive  part  and  directions  of  nonpositive 
curvature  operating  on  the  factored  Hessian  are  now  established  as  an  alterna- 
tive to  the  decomposition  of  the  Hessian  matrix  in  its  symmetric  n by  n 
form.  Which  of  the  basic  strategies  is  best  to  use  (S1MNM,  S2MNM,  or  S3MNM) 
and  which  direction  of  nonpositive  curvature  is  best  has  not  been  answered. 

It  does  appear  that  the  Second  Order  Armijo  Step  Size  procedure  does  not  move 
enough  in  the  direction  of  the  nonpositive  curvature  and  requires  many  more 
moves  to  minimize  a nonconvex  function.  It  does  best  in  the  positive  definite 
case  requiring  fewer  function  evaluations  and  computer  time.  This  is  misleading 
however  since  an  inefficient  optimal  step-size  procedure  was  used  for  (5.1). 

An  idea  which  came  out  of  this  research  and  may  prove  fruitful  is  a 
modification  of  the  basic  algorithm  given  in  Section  5.  When  < 0 occurs, 
instead  of  ignoring  that  rank  one  matrix,  a better  strategy  might  be  to  absorb 
as  much  of  it  as  possible  without  making  the  resulting  matrix  have  a negative 
eigenvalue.  The  formula  is  simply 
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Vi " pj  + V'VjV  ajpj  ’ 

This  modification  would  have  the  advantage  of  simplicity  and  reduce 
the  number  of  kinds  of  directions  of  nonpositive  curvature  considered. 


This  will  be  tested  in  the  future. 
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